Skip to content

(improvement) Cython row_parser: cache ParseDesc for prepared statements#9

Open
mykaul wants to merge 11 commits into
masterfrom
perf/cache-parse-desc
Open

(improvement) Cython row_parser: cache ParseDesc for prepared statements#9
mykaul wants to merge 11 commits into
masterfrom
perf/cache-parse-desc

Conversation

@mykaul

@mykaul mykaul commented Mar 13, 2026

Copy link
Copy Markdown
Owner

Summary

  • Cache the ParseDesc object constructed in recv_results_rows() so repeated executions of the same prepared statement skip list comprehensions, ColDesc construction, and make_deserializers() calls
  • Cache keyed by id(column_metadata) with identity verification on hit — safe for prepared statements where result_metadata is stored on PreparedStatement and reused
  • Includes clear_parse_desc_cache() function for testing

Benchmark results (median, pytest-benchmark)

ParseDesc construction only

Columns Before (original) After (with cache)
5 cols 3,966 ns 191 ns
10 cols 5,730 ns 175 ns
20 cols 9,266 ns 166 ns
50 cols 19,388 ns 193 ns

Full pipeline (ParseDesc + row parsing)

Scenario Before (original) After (with cache)
1 row × 10 col 6,674 ns 1,418 ns
100 rows × 5 col 48,814 ns 43,825 ns
1000 rows × 5 col 449,386 ns 430,058 ns

For single-row lookups (common with prepared statements), ParseDesc construction is a large fraction of the total response-path cost. Caching eliminates it entirely after the first execution.

All 116 unit tests pass (1 skipped — pre-existing test_datetype issue).

@mykaul mykaul changed the title (improvement) row_parser: cache ParseDesc for prepared statements (improvement) Cython row_parser: cache ParseDesc for prepared statements Mar 13, 2026
@mykaul mykaul force-pushed the perf/cache-parse-desc branch 2 times, most recently from ee3c279 to 9f2f3ff Compare March 15, 2026 16:45
sylwiaszunejko and others added 11 commits March 17, 2026 23:47
Introduce the data layer for Private Link client routes support:

- ClientRoutesChangeType enum for CLIENT_ROUTES_CHANGE event types
- ClientRouteProxy dataclass and ClientRoutesConfig for user-facing
  configuration
- _Route frozen dataclass for immutable route records
- _RouteStore for thread-safe route storage with atomic update/merge
  and preferred route selection that avoids unnecessary connection_id
  migration when multiple routes exist for the same host
Add _ClientRoutesHandler which manages the full lifecycle of dynamic
address translation via system.client_routes:

- initialize(): loads all routes at startup and on control connection
  reconnect
- handle_client_routes_change(): processes CLIENT_ROUTES_CHANGE events
  with targeted merge or full refresh depending on event data
- _query_all_routes_for_connections(): complete refresh query using
  connection_id IN (...)
- _query_routes_for_change_event(): targeted query grouping by
  connection_id with host_id IN (...) per group
- _execute_routes_query(): common query execution and result parsing
  with proxy address override support
- resolve_host(): host_id to (address, port) resolution with DNS lookup
- ClientRoutesEndPointFactory: creates endpoints from system.peers rows
  by extracting host_id, deferring address translation and DNS resolution
  until connection time
- ClientRoutesEndPoint: endpoint that resolves via _ClientRoutesHandler
  on each connection attempt, ensuring immediate reaction to route changes
  and CLIENT_ROUTES_CHANGE events
Cluster:
- Add client_routes_config parameter with mutual exclusivity check
  against endpoint_factory
- Create _ClientRoutesHandler and ClientRoutesEndPointFactory when
  client_routes_config is provided

ControlConnection:
- Register CLIENT_ROUTES_CHANGE event watcher when handler is present
- Forward events to handler via _handle_client_routes_change
- Trigger full route re-read on control connection reconnection
Cover ClientRouteEntry/ClientRoutesConfig validation, _RouteStore
get/merge operations, _ClientRoutesHandler initialization,
ClientRoutesEndPoint resolution with and without route mappings,
and SSL check_hostname rejection with client_routes_config.
Add comprehensive integration tests covering:
- TCP proxy and NLB emulator infrastructure for simulating
  private link connectivity
- query_routes filtering with different connection/host ID combinations
- Full private-link connectivity verifying all driver connections
  go exclusively through the NLB proxy
- Dynamic route updates via REST API with driver reconnection
  through new proxy ports
Cache the ParseDesc object constructed in recv_results_rows() so that
repeated executions of the same prepared statement skip the list
comprehensions, ColDesc construction, and make_deserializers() call.

The cache is keyed by id(column_metadata). For prepared statements the
result_metadata list is stored on PreparedStatement and reused, so id()
is stable. On cache hit we verify object identity (cached_ref is
column_metadata) and that session-level settings (column_encryption_policy,
protocol_version) still match.

Implementation details:
- _get_or_build_parse_desc: cached path, used only when column_metadata
  comes from result_metadata (prepared statements with stable id()).
- _build_parse_desc: uncached path, used for inline metadata from
  non-prepared queries that creates a fresh list every execution.
- Cache is bounded to 256 entries; cleared entirely when full.
- Returns only (desc, column_names, column_types) to the caller,
  avoiding exposure of internal cache fields.
- Thread-safety: dict get/set are atomic in CPython (GIL), but
  concurrent cache misses may cause redundant construction (benign).

A clear_parse_desc_cache() and get_parse_desc_cache_size() function
are exposed for testing.

## Benchmark results (median, pytest-benchmark)

### ParseDesc construction only (reference benchmarks)

| Columns | **Before** (original) | **After** (with cache) |
|---------|-----------------------|------------------------|
| 5 cols  | 3,966 ns              | 191 ns                 |
| 10 cols | 5,730 ns              | 175 ns                 |
| 20 cols | 9,266 ns              | 166 ns                 |
| 50 cols | 19,388 ns             | 193 ns                 |

### Full pipeline integration (recv_results_rows through Cython)

| Scenario          | **Before** (original) | **After** (with cache) |
|-------------------|-----------------------|------------------------|
| 1 row x 10 col    | 40,867 ns             | 2,977 ns               |
| 100 rows x 5 col  | 145,584 ns            | 73,206 ns              |
| 1000 rows x 5 col | 1,099,825 ns          | 999,517 ns             |

For small result sets (single-row lookups common with prepared statements),
ParseDesc construction is a large fraction of the total response-path cost.
Caching eliminates it entirely after the first execution.

All 623 unit tests pass (16 skipped - pre-existing).
- Fix copyright header in benchmark file (DataStax -> ScyllaDB)
- Add pytest.importorskip guard for pytest-benchmark in benchmark file
- Add unit tests for ParseDesc cache under tests/unit/cython: cache hit,
  miss, protocol version invalidation, clear, bounded eviction, correctness
@mykaul mykaul force-pushed the perf/cache-parse-desc branch from 9f2f3ff to 203080e Compare March 20, 2026 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants